
Optimize server-setup by offloading large matrix multiplication to GPU #11

Merged
itzmeanjan merged 29 commits into main from integrate-mat-mul-on-gpu on Apr 6, 2025


@itzmeanjan (Owner)

Uses Vulkan compute shaders to offload large matrix multiplication and matrix transposition to the GPU, gated behind the non-default gpu feature, to speed up the server-setup phase of ChalametPIR.
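For reference, the CPU work being offloaded boils down to dense matrix multiplication and transposition. The Rust sketch below shows naive row-major versions of both. The `Matrix` type is hypothetical, and the choice of u32 elements with arithmetic wrapping modulo 2^32 is an illustrative assumption, not ChalametPIR's confirmed parameterisation.

```rust
/// Hypothetical row-major matrix of u32 elements.
/// Assumption: arithmetic wraps modulo 2^32 (illustrative only).
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    rows: usize,
    cols: usize,
    elems: Vec<u32>,
}

impl Matrix {
    fn new(rows: usize, cols: usize, elems: Vec<u32>) -> Self {
        assert_eq!(elems.len(), rows * cols, "element count must equal rows * cols");
        Matrix { rows, cols, elems }
    }

    /// Naive O(rows * cols * rhs.cols) multiplication: the kind of work a
    /// compute shader can spread across many GPU invocations.
    fn mul(&self, rhs: &Matrix) -> Matrix {
        assert_eq!(self.cols, rhs.rows, "inner dimensions must match");
        let mut out = vec![0u32; self.rows * rhs.cols];
        for i in 0..self.rows {
            for k in 0..self.cols {
                let a = self.elems[i * self.cols + k];
                for j in 0..rhs.cols {
                    let idx = i * rhs.cols + j;
                    // Accumulate a[i][k] * b[k][j] with wrapping (mod 2^32) arithmetic.
                    out[idx] = out[idx].wrapping_add(a.wrapping_mul(rhs.elems[k * rhs.cols + j]));
                }
            }
        }
        Matrix::new(self.rows, rhs.cols, out)
    }

    /// Transpose: element (i, j) moves to (j, i).
    fn transpose(&self) -> Matrix {
        let mut out = vec![0u32; self.elems.len()];
        for i in 0..self.rows {
            for j in 0..self.cols {
                out[j * self.rows + i] = self.elems[i * self.cols + j];
            }
        }
        Matrix::new(self.cols, self.rows, out)
    }
}

fn main() {
    let a = Matrix::new(2, 3, vec![1, 2, 3, 4, 5, 6]);
    let b = Matrix::new(3, 2, vec![7, 8, 9, 10, 11, 12]);
    println!("{:?}", a.mul(&b).elems); // prints [58, 64, 139, 154]
    println!("{:?}", a.transpose().elems); // prints [1, 4, 2, 5, 3, 6]
}
```

Every output element is independent of the others, which is what makes both operations a natural fit for a data-parallel GPU dispatch.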

Signed-off-by: Anjan Roy <hello@itzmeanjan.in>
@itzmeanjan (Owner, Author)

Without the gpu feature, server-setup cost on an Intel i7-1260P CPU:

$ cargo bench --features mutate_internal_client_state --profile optimized --bench offline_phase -q server_setup
Timer precision: 10 ns
offline_phase                                                                        fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ server_setup                                                                                    │               │               │               │         │
   ├─ 3                                                                                            │               │               │               │         │
   │  ╰─ DBConfig { db_entry_count: 65536, key_byte_len: 32, value_byte_len: 1024 }  2.522 m       │ 2.648 m       │ 2.585 m       │ 2.585 m       │ 2       │ 2
   ╰─ 4                                                                                            │               │               │               │         │
      ╰─ DBConfig { db_entry_count: 65536, key_byte_len: 32, value_byte_len: 1024 }  2.535 m       │ 2.552 m       │ 2.543 m       │ 2.543 m       │ 2       │ 2

With the gpu feature enabled, server-setup is ~12.45x faster 🚀

$ cargo bench --features mutate_internal_client_state,gpu --profile optimized --bench offline_phase -q server_setup
Timer precision: 10 ns
offline_phase                                                                        fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ server_setup                                                                                    │               │               │               │         │
   ├─ 3                                                                                            │               │               │               │         │
   │  ╰─ DBConfig { db_entry_count: 65536, key_byte_len: 32, value_byte_len: 1024 }  12.18 s       │ 12.69 s       │ 12.45 s       │ 12.46 s       │ 25      │ 25
   ╰─ 4                                                                                            │               │               │               │         │
      ╰─ DBConfig { db_entry_count: 65536, key_byte_len: 32, value_byte_len: 1024 }  11.73 s       │ 12.24 s       │ 11.86 s       │ 11.87 s       │ 26      │ 26
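The speedup can be recomputed from the median columns of the two runs above. A small Rust sketch, assuming "m" in the benchmark table denotes minutes and "s" seconds; the helper names are hypothetical.

```rust
/// Converts a duration as printed in the benchmark table
/// (e.g. "2.585 m" for minutes, "12.45 s" for seconds) into seconds.
fn to_secs(s: &str) -> Option<f64> {
    let (value, unit) = s.trim().split_once(' ')?;
    let value: f64 = value.parse().ok()?;
    match unit {
        "m" => Some(value * 60.0),
        "s" => Some(value),
        "ms" => Some(value / 1_000.0),
        _ => None,
    }
}

/// Speedup of the GPU-assisted run relative to the CPU-only run.
fn speedup(cpu_secs: f64, gpu_secs: f64) -> f64 {
    cpu_secs / gpu_secs
}

fn main() {
    // (benchmark case, CPU-only median, median with the gpu feature),
    // copied from the two runs above.
    let medians = [("3", "2.585 m", "12.45 s"), ("4", "2.543 m", "11.86 s")];
    for (label, cpu, gpu) in medians {
        let (cpu, gpu) = (to_secs(cpu).unwrap(), to_secs(gpu).unwrap());
        println!("case {label}: {cpu:.1} s -> {gpu:.2} s, {:.2}x faster", speedup(cpu, gpu));
    }
}
```

This gives roughly 12.46x for case 3 and 12.87x for case 4, in line with the ~12.45x quoted above.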

@itzmeanjan (Owner, Author)

I also benchmarked server-setup on an AWS EC2 g6e.8xlarge instance, which features an NVIDIA L40S Tensor Core GPU.

Server-setup on CPU [image: server-setup-on-cpu]

Server-setup, partially offloaded to GPU [image: server-setup-on-gpu]

Note

Server-setup can be partially offloaded to the GPU by enabling the gpu feature. This feature requires Vulkan drivers and the Vulkan library to be installed.
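A minimal sketch of how a non-default Cargo feature can select a CPU or GPU code path at compile time, using transposition as the example. All names here (`cpu_transpose`, `setup_transpose`, the `gpu` module) are hypothetical, not ChalametPIR's actual API, and the `gpu` module is only a stand-in that reuses the CPU code where a real implementation would record and submit a Vulkan compute dispatch.

```rust
/// Portable CPU transpose of a row-major `rows x cols` matrix.
fn cpu_transpose(a: &[u32], rows: usize, cols: usize) -> Vec<u32> {
    assert_eq!(a.len(), rows * cols, "element count must equal rows * cols");
    let mut out = vec![0u32; a.len()];
    for i in 0..rows {
        for j in 0..cols {
            // Element (i, j) of the input becomes element (j, i) of the output.
            out[j * rows + i] = a[i * cols + j];
        }
    }
    out
}

/// Hypothetical GPU backend, compiled only with `--features gpu`.
#[cfg(feature = "gpu")]
mod gpu {
    /// Stand-in: a real version would upload `a` to a device buffer, dispatch a
    /// Vulkan compute shader, and read the result back. Here it reuses the CPU
    /// path so the sketch compiles with or without the feature.
    pub fn transpose(a: &[u32], rows: usize, cols: usize) -> Vec<u32> {
        super::cpu_transpose(a, rows, cols)
    }
}

/// Backend chosen at compile time: GPU path when the feature is enabled.
#[cfg(feature = "gpu")]
fn setup_transpose(a: &[u32], rows: usize, cols: usize) -> Vec<u32> {
    gpu::transpose(a, rows, cols)
}

/// Default build: CPU path only, no Vulkan dependency needed.
#[cfg(not(feature = "gpu"))]
fn setup_transpose(a: &[u32], rows: usize, cols: usize) -> Vec<u32> {
    cpu_transpose(a, rows, cols)
}

fn main() {
    let a: [u32; 6] = [1, 2, 3, 4, 5, 6]; // 2 x 3, row-major
    println!("{:?}", setup_transpose(&a, 2, 3)); // prints [1, 4, 2, 5, 3, 6]
}
```

In a Cargo project this assumes a `[features]` table that declares `gpu` without listing it in `default`, so plain builds never pull in Vulkan and `--features gpu` opts in, as in the benchmark command above.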

itzmeanjan merged commit 0646d4e into main on Apr 6, 2025
5 checks passed
itzmeanjan deleted the integrate-mat-mul-on-gpu branch on April 6, 2025, 06:55